Exploiting Syntactic and Distributional Information for Spelling Correction with Web-Scale N-gram Models

Authors

  • Wei Xu
  • Joel R. Tetreault
  • Martin Chodorow
  • Ralph Grishman
  • Le Zhao
Abstract

We propose a novel way of incorporating dependency parse and word co-occurrence information into a state-of-the-art web-scale n-gram model for spelling correction. The syntactic and distributional information provides extra evidence in addition to that provided by a web-scale n-gram corpus and especially helps with data sparsity problems. Experimental results show that introducing syntactic features into n-gram-based models significantly reduces errors, by up to 12.4% over the current state of the art. The word co-occurrence information shows potential but only improves overall accuracy slightly.

Similar resources

Web-Scale N-gram Models for Lexical Disambiguation

Web-scale data has been used in a diverse range of language research. Most of this research has used web counts for only short, fixed spans of context. We present a unified view of using web counts for lexical disambiguation. Unlike previous approaches, our supervised and unsupervised systems combine information from multiple and overlapping segments of context. On the tasks of preposition sele...

CloudSpeller: Spelling Correction for Search Queries by Using a Unified Hidden Markov Model with Web-scale Resources

Query spelling correction is a crucial component of modern search engines that can help users to express an information need more accurately and thus improve search quality. In participation of the Microsoft Speller Challenge, we proposed and implemented an efficient end-to-end speller correction system, namely CloudSpeller. The CloudSpeller system uses a Hidden Markov model to effectively model...

Exploring Distributional Similarity Based Models for Query Spelling Correction

A query speller is crucial to a search engine in improving web search relevance. This paper describes novel methods for the use of distributional similarity estimated from query logs in learning improved query spelling correction models. The key to our methods is the property of distributional similarity between two terms: it is high between a frequently occurring misspelling and its correction, and ...

A Comparative Study of Bing Web N-gram Language Models for Web Search and Natural Language Processing

This paper presents a comparative study of the recently released Microsoft Web N-gram Language Models (MWNLM) on three web search and natural language processing tasks: search query spelling correction, query reformulation, and statistical machine translation. MWNLM, as well as the corresponding web services, called Microsoft Web N-gram Services, are much more accessible and easier to use than ...

Smoothing issues in the structured language model

The Structured Language Model (SLM) recently introduced by Chelba and Jelinek is a powerful general formalism for exploiting syntactic dependencies in a left-to-right language model for applications such as speech and handwriting recognition, spelling correction, machine translation, etc. Unlike traditional N-gram models, optimal smoothing techniques – discounting methods and hierarchical struc...

Publication date: 2011